38 research outputs found

    VM-MAD: a cloud/cluster software for service-oriented academic environments

    Full text link
    The availability of powerful computing hardware in IaaS clouds makes cloud computing attractive also for computational workloads that were up to now almost exclusively run on HPC clusters. In this paper we present the VM-MAD Orchestrator software: an open source framework for cloudbursting Linux-based HPC clusters into IaaS clouds but also computational grids. The Orchestrator is completely modular, allowing flexible configurations of cloudbursting policies. It can be used with any batch system or cloud infrastructure, dynamically extending the cluster when needed. A distinctive feature of our framework is that the policies can be tested and tuned in a simulation mode based on historical or synthetic cluster accounting data. In the paper we also describe how the VM-MAD Orchestrator was used in a production environment at the FGCZ to speed up the analysis of mass spectrometry-based protein data by cloudbursting to the Amazon EC2. The advantages of this hybrid system are shown with a large evaluation run using about hundred large EC2 nodes.Comment: 16 pages, 5 figures. Accepted at the International Supercomputing Conference ISC13, June 17--20 Leipzig, German

    UBIDEV: a homogeneous service framework for pervasive computing environments

    Get PDF
    This dissertation studies the heterogeneity problem of pervasive computing system from the viewpoint of an infrastructure aiming to provide a service-oriented application model. From Distributed System passing through mobile computing, pervasive computing is presented as a step forward in ubiquitous availability of services and proliferation of interacting autonomous entities. To better understand the problems related to the heterogeneous and dynamic nature of pervasive computing environments, we need to analyze the structure of a pervasive computing system from its physical and service dimension. The physical dimension describes the physical environment together wit the technology infrastructure that characterizes the interactions and the relations within the environment; the service dimension represents the services (being them software or not) the environment is able to provide [Nor99]. To better separate the constrains and the functionalities of a pervasive computing system, this dissertation classifies it in terms of resources, context, classification, services, coordination and application. UBIDEV, as the key result of this dissertation, introduces a unified model helping the design and the implementation of applications for heterogeneous and dynamic environments. This model is composed of the following concepts: • Resource: all elements of the environment that are manipulated by the application, they are the atomic abstraction unit of the model. • Context: all information coming from the environment that is used by the application to adapts its behavior. Context contains resources and services and defines their role in the application. • Classification: the environment is classified according to the application ontology in order to ground the generic conceptual model of the application to the specific environment. It defines the basic semantic level of interoperability. • Service: the functionalities supported by the system; each service manipulates one or more resources. Applications are defined as a coordination and adaptation of services. • Coordination: all aspects related to service composition and execution as well as the use of the contextual information are captured by the coordination concept. • Application Ontology: represents the viewpoint of the application on the specific context; it defines the high level semantic of resources, services and context. Applying the design paradigm proposed by UBIDEV, allows to describe applications according to a Service Oriented Architecture[Bie02], and to focus on application functionalities rather than their relations with the physical devices. Keywords: pervasive computing, homogenous environment, service-oriented, heterogeneity problem, coordination model, context model, resource management, service management, application interfaces, ontology, semantic services, interaction logic, description logic.Questa dissertazione studia il problema della eterogeneit`a nei sistemi pervasivi proponendo una infrastruttura basata su un modello orientato ai servizi. I sistemi pervasivi sono presentati come un’evoluzione naturale dei sistemi distribuiti, passando attraverso mobile computing, grazie ad una disponibilit`a ubiqua di servizi (sempre, ovunque ed in qualunque modo) e ad loro e con l’ambiente stesso. Al fine di meglio comprendere i problemi legati allintrinseca eterogeneit`a dei sistemi pervasivi, dobbiamo prima descrivere la struttura fondamentale di questi sistemi classificandoli attraverso la loro dimensione fisica e quella dei loro servizi. La dimensione fisica descrive l’ambiente fisico e tutti i dispositivi che fanno parte del contesto della applicazione. La dimensione dei servizi descrive le funzionalit`a (siano esse software o no) che l’ambiente `e in grado di fornire [Nor99]. I sistemi pervasivi vengono cos`ı classificati attraverso una metrica pi `u formale del tipo risorse, contesto, servizi, coordinazione ed applicazione. UBIDEV, come risultato di questa dissertazione, introduce un modello uniforme per la descrizione e lo sviluppo di applicazioni in ambienti dinamici ed eterogenei. Il modello `e composto dai seguenti concetti di base: • Risorse: gli elementi dell’ambiente fisico che fanno parte del modello dellapplicazione. Questi rappresentano l’unit`a di astrazione atomica di tutto il modello UBIDEV. • Contesto: le informazioni sullo stato dell’ambiente che il sistema utilizza per adattare il comportamento dell’applicazione. Il contesto include informazioni legate alle risorse, ai servizi ed alle relazioni che li legano. • Classificazione: l’ambiente viene classificato sulla base di una ontologia che rappresenta il punto di accordo a cui tutti i moduli di sistema fanno riferimento. Questa classificazione rappresenta il modello concettuale dell’applicazione che si riflette sull’intero ambiente. Si definisce cos`ı la semantica di base per tutto il sistema. • Servizi: le funzionalit`a che il sistema `e in grado di fornire; ogni servizio `e descritto in termini di trasformazione di una o pi `u risorse. Le applicazioni sono cos`ı definite in termini di cooperazione tra servizi autonomi. • Coordinazione: tutti gli aspetti legati alla composizione ed alla esecuzione di servizi cos`ı come l’elaborazione dell’informazione contestuale. • Ontologia dell’Applicazione: rappresenta il punto di vista dell’applicazione; definisce la semantica delle risorse, dei servizi e dell’informazione contestuale. Applicando il paradigma proposto da UBIDEV, si possono descrivere applicazioni in accordo con un modello Service-oriented [Bie02] ed, al tempo stesso, ridurre l’applicazione stessa alle sue funzionalit`a di alto livello senza intervenire troppo su come queste funzionalit` a devono essere realizzate dalle singole componenti fisiche

    GridCertLib: A Single Sign-on Solution for Grid Web Applications and Portals

    Get PDF
    This paper describes the design and implementation of GridCertLib, a Java library leveraging a Shibboleth-based authentication infrastructure and the SLCS online certificate signing service, to provide short-lived X.509 certificates and Grid proxies. The main use case envisioned for GridCertLib, is to provide seamless and secure access to Grid X.509 certificates and proxies in web applications and portals: when a user logs in to the portal using SAML-based Shibboleth authentication, GridCertLib uses the SAML assertion to obtain a Grid X.509 certificate from the SLCS service and generate a VOMS proxy from it. We give an overview of the architecture of GridCertLib and briefly describe its programming model. Its application to some deployment scenarios is outlined, as well as a report on practical experience integrating GridCertLib into portals for Bioinformatics and Computational Chemistry applications, based on the popular P-GRADE and Django software

    gcodeml: A Grid-enabled Tool for Detecting Positive Selection in Biological Evolution

    Get PDF
    One of the important questions in biological evolution is to know if certain changes along protein coding genes have contributed to the adaptation of species. This problem is known to be biologically complex and computationally very expensive. It, therefore, requires efficient Grid or cluster solutions to overcome the computational challenge. We have developed a Grid-enabled tool (gcodeml) that relies on the PAML (codeml) package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report on results for gcodeml, our approach is applicable and customisable to related problems in biology or other scientific domains.Comment: 10 pages, 4 figures. To appear in the HealthGrid 2012 con

    SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

    Get PDF
    Many time-consuming analyses of next -: generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics BECAUSE OF: their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker node

    Towards a Swiss National Research Infrastructure

    Full text link
    In this position paper we describe the current status and plans for a Swiss National Research Infrastructure. Swiss academic and research institutions are very autonomous. While being loosely coupled, they do not rely on any centralized management entities. Therefore, a coordinated national research infrastructure can only be established by federating the various resources available locally at the individual institutions. The Swiss Multi-Science Computing Grid and the Swiss Academic Compute Cloud projects serve already a large number of diverse user communities. These projects also allow us to test the operational setup of such a heterogeneous federated infrastructure

    GridCertLib: a Single Sign-on Solution for Grid Web Applications and Portals

    Full text link
    This paper describes the design and implementation of GridCertLib, a Java library leveraging a Shibboleth-based authentication infrastructure and the SLCS online certificate signing service, to provide short-lived X.509 certificates and Grid proxies. The main use case envisioned for GridCertLib, is to provide seamless and secure access to Grid/X.509 certificates and proxies in web applications and portals: when a user logs in to the portal using Shibboleth authentication, GridCertLib can automatically obtain a Grid/X.509 certificate from the SLCS service and generate a VOMS proxy from it. We give an overview of the architecture of GridCertLib and briefly describe its programming model. Its application to some deployment scenarios is outlined, as well as a report on practical experience integrating GridCertLib into portals for Bioinformatics and Computational Chemistry applications, based on the popular P-GRADE and Django softwares.Comment: 18 pages, 1 figure; final manuscript accepted for publication by the "Journal of Grid Computing

    swissPIT: a novel approach for pipelined analysis of mass spectrometry data

    Get PDF
    The identification and characterization of peptides from tandem mass spectrometry (MS/MS) data represents a critical aspect of proteomics. Today, tandem MS analysis is often performed by only using a single identification program achieving identification rates between 10-50% (Elias and Gygi, 2007). Beside the development of new analysis tools, recent publications describe also the pipelining of different search programs to increase the identification rate (Hartler et al., 2007; Keller et al., 2005). The Swiss Protein Identification Toolbox (swissPIT) follows this approach, but goes a step further by providing the user an expandable multi-tool platform capable of executing workflows to analyze tandem MS-based data. One of the major problems in proteomics is the absent of standardized workflows to analyze the produced data. This includes the pre-processing part as well as the final identification of peptides and proteins. The main idea of swissPIT is not only the usage of different identification tool in parallel, but also the meaningful concatenation of different identification strategies at the same time. The swissPIT is open source software but we also provide a user-friendly web platform, which demonstrates the capabilities of our software and which is available at http://swisspit.cscs.ch upon request for account. Contact: [email protected]
    corecore